Abstract



Steganographic protocols enable one to "embed" covert messages into inconspicuous data over a
public communication channel in such a way that no one, aside from the sender and the intended receiver,
can even detect the presence of the secret message. In this paper, we provide a new provably-secure, 
private-key steganographic encryption protocol. We prove the security of our protocol in the complexity- 
theoretic framework where security is quantified as the advantage (compared to a random guess) that 
the adversary has in distinguishing between innocent covertext and stegotext that embeds a message of 
his choice. The fundamental building block of our steganographic encryption protocol is a "one-time 
stegosystem" that allows two parties to transmit messages of length at most that of the shared key with 
information-theoretic security guarantees. The employment of a pseudorandom generator (PRG) permits 
secure transmission of longer messages in the same way that such a generator allows the use of one-time 
pad encryption for messages longer than the key in symmetric encryption. In this paper, we initiate 
the study of employing randomness extractors in a steganographic protocol construction to embed secret 
messages over the channel. To the best of our knowledge this is the first time randomness extractors 
have been applied in steganography. 

Keywords: Information hiding, steganography, data hiding, steganalysis, covert communication. 

1 Introduction 

The steganographic communication problem can be best described using Simmons' [ 3] formulation of the
problem. In this scenario, prisoners Alice and Bob wish to communicate securely in the presence of an
adversary, called the "Warden," who monitors whether they exchange "conspicuous" messages. In particular,
Alice and Bob may exchange messages that adhere to a certain channel distribution that represents
"inconspicuous" communication. By controlling the messages that are transmitted over such a channel, Alice and
Bob may exchange messages that cannot be detected by the Warden. There have been two approaches to
formalizing this problem, one based on information theory [1, 15, 5] and one based on complexity theory [3, 4].
The latter approach is more concrete and has the potential of allowing more efficient constructions. Most 
steganographic constructions supported by provable security guarantees are instantiations of the following 
basic procedure (often referred to as "rejection-sampling"). 

The problem specifies a family of message distributions (the "channel distributions") that provide a 
number of possible options for a so-called "covertext" to be transmitted. Additionally, the sender and the 
receiver possess some sort of private information (typically a keyed hash function, MAC, or other similar
function) that maps channel messages to a single bit. In order to send a message bit m, the sender draws a 
covertext from the channel distribution, applies the function to the covertext and checks whether it happens 
to produce the "stegotext" m he originally wished to transmit. If this is the case, the covertext is transmitted.
In case of failure, this procedure is repeated. While this is a fairly concrete procedure, there are a number 
of choices to be made with both practical and theoretical significance. From the security viewpoint, one is 
primarily interested in the choice of the function that is shared between the sender and the receiver. From a 
practical viewpoint, one is primarily interested in how the channel is implemented and whether it conforms 
to the various constraints that are imposed on it by the steganographic protocol specifications (e.g., are 
independent draws from the channel allowed? does the channel remember previous draws? etc.). 
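
The single-bit procedure described above can be sketched as follows. This is a minimal illustration only: the keyed function is assumed to be the low bit of an HMAC-SHA256 tag, the channel is a toy source of random 4-byte strings, and the names `keyed_bit`, `embed_bit`, `draw_covertext` and the `max_tries` cap are hypothetical and not taken from the paper.

```python
import hashlib
import hmac
import random


def keyed_bit(key: bytes, covertext: bytes) -> int:
    """Map a covertext to one bit with a keyed hash (HMAC-SHA256 as a stand-in)."""
    return hmac.new(key, covertext, hashlib.sha256).digest()[0] & 1


def embed_bit(key: bytes, bit: int, draw, max_tries: int = 64) -> bytes:
    """Draw covertexts until one maps to `bit`; the last allowed draw is sent as-is."""
    covertext = draw()
    for _ in range(max_tries - 1):
        if keyed_bit(key, covertext) == bit:
            break
        covertext = draw()
    return covertext


rng = random.Random(0)
# Toy channel (an assumption): independent draws of 4 random bytes.
draw_covertext = lambda: bytes(rng.randrange(256) for _ in range(4))
key = b"shared-secret"

message_bits = [1, 0, 1, 1]
sent = [embed_bit(key, b, draw_covertext) for b in message_bits]
received = [keyed_bit(key, c) for c in sent]   # the receiver only applies the function
print(received)
```

The receiver never sees the rejected draws; it recovers each bit by applying the shared function to the covertext that was actually transmitted.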

Our model differs from the traditional approach to steganography where the sender modifies a covertext 
that is known to the adversary in an effort to embed secret data. Such an approach is secure only against 
adversaries with limited detection capability. This approach is found, for instance, in several software 
applications which manipulate certain pixels of visual images to embed hidden information. While such
minor perturbations to an image may be imperceptible to the human eye, they are trivially discerned by an
algorithm with access to the original cover image.

As mentioned above, the security of a stegosystem can be naturally phrased in information-theoretic
terms (cf. [1]) or in complexity-theoretic terms [3, 4]. Informally, the latter approach considers the following
experiment for the warden-adversary: The adversary selects a message to be embedded and receives either 
covertexts that embed the message or covertexts simply drawn from the channel distribution (without any 
embedding). The adversary is then asked to distinguish between the two cases. Clearly, if the probability 
of success is very close to 1/2 it is natural to claim that the stegosystem provides security against such 
(eavesdropping) adversarial activity. Formulation of stronger attacks (such as active attacks) is also possible. 

Given the above framework, Hopper et al. [3] provided a provably secure stegosystem that pairs rejection 
sampling with a pseudorandom function family. In this article we take an alternative approach to the 
design of provably secure stegosystems. Our main contribution is the design of a building block that we 
call a one-time stegosystem: this is a steganographic protocol that is meant to be used for a single message 
transmission and is proven secure in an information-theoretic sense, provided that the key that is shared 
between the sender and the receiver is of sufficient length. In particular we show that we can securely 
transmit a v bit message with a secret key of length v\ Our basic building block is a natural analogue 
of a one-time pad for steganography. It is based on the rejection sampling technique outlined above in
combination with randomness extractors. To the best of our knowledge, this is the first time randomness 
extractors have been employed in the design of steganographic protocols. Given a one-time stegosystem, it 
is fairly straightforward to construct provably secure steganographic encryption for longer messages by using 
a pseudorandom generator (PRG) to stretch a random seed that is shared by the sender and the receiver to 
sufficient length. The resulting stegosystem is provably secure in the computational model. 
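
The stretching step can be illustrated with a short sketch. It assumes SHAKE-256 as a stand-in for the pseudorandom generator (the paper does not fix a particular PRG), and the helper names `stretch` and `xor` are hypothetical.

```python
import hashlib


def stretch(seed: bytes, n_bytes: int) -> bytes:
    """Expand a short shared seed to n_bytes (SHAKE-256 used as a PRG stand-in)."""
    return hashlib.shake_256(seed).digest(n_bytes)


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


seed = b"16-byte-seed-abc"
message = b"a message much longer than the shared seed"

pad = stretch(seed, len(message))      # both parties can recompute this pad
masked = xor(message, pad)             # what the one-time stegosystem would embed
recovered = xor(masked, stretch(seed, len(message)))
```

Security of the combined scheme is then only computational, since it rests on the pseudorandomness of the generator rather than on a truly random pad.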

2 Definitions and Tools 
2.1 Preliminaries 

We say that a function $\mu : \mathbb{N} \to \mathbb{R}$ is negligible if for every positive polynomial $p(\cdot)$, there exists an $N$
such that for all $n > N$, $\mu(n) < 1/p(n)$. We use the notation $x \leftarrow X$ to denote sampling an element $x$ from
a distribution $X$ and the notation $x \in_R S$ to denote sampling an element $x$ uniformly at random from a
set $S$. For a function $f$ and a distribution $X$ on its domain, $f(X)$ denotes the distribution of sampling $x$
from $X$ and applying $f$ to $x$. The uniform distribution on $\{0,1\}^d$ is denoted by $U_d$ and $U(X)$ denotes the
uniform distribution on a finite set $X$. We denote the length (in bits) of a string or integer $s$ by $|s|$ and the
cardinality of a set $S$ by $|S|$. The concatenation of string $s_1$ and string $s_2$ is denoted by $s_1 \circ s_2$.
"log" indicates the logarithm base 2 and "ln" denotes the natural logarithm. For completeness, we record
below a few inequalities we use.
Theorem 1 (Markov's inequality). Let $X$ be a random variable that takes only non-negative real values.
Then for every $a > 0$, we have

$$\Pr[X \ge a] \le \frac{E[X]}{a}.$$

Theorem 2 (Law of Total Probability). Let $A$ and $B$ be events in a probability space $\Omega$, and $0 < \Pr[B] < 1$.
Then,

$$\Pr[A] = \Pr[A \mid B] \cdot \Pr[B] + \Pr[A \mid \bar{B}] \cdot \Pr[\bar{B}].$$

Theorem 3 (Boole's inequality). Let $A_1, A_2, \ldots, A_m$ be a countable set of events in a probability space $\Omega$.
Then,

$$\Pr\Bigl[\bigcup_{i=1}^{m} A_i\Bigr] \le \sum_{i=1}^{m} \Pr[A_i].$$

2.2 $\epsilon$-biased functions

Definition 1 ([14]). Let $P$ be a distribution with a finite support $X$. A function $f : X \to Y$ is $\epsilon$-biased if

$$\bigl|\Pr_{x \leftarrow P}[f(x) = y] - 1/|Y|\bigr| \le \epsilon \quad \forall y \in Y.$$

We say that $f$ is unbiased if $f$ is $\epsilon$-biased for $\epsilon$ a negligible function of the appropriate security parameter,
and finally $f$ is said to be perfectly unbiased if

$$\bigl|\Pr_{x \leftarrow P}[f(x) = y] - 1/|Y|\bigr| = 0 \quad \forall y \in Y.$$
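
As a small worked example of Definition 1, the bias of a function on a finite distribution can be computed exactly. The distribution `P` and the two functions below are made-up illustrations, and `bias` is a hypothetical helper name.

```python
from collections import defaultdict
from fractions import Fraction


def bias(f, dist, codomain):
    """Return max over y of |Pr_{x <- dist}[f(x) = y] - 1/|codomain||."""
    mass = defaultdict(Fraction)
    for x, p in dist.items():
        mass[f(x)] += p
    uniform = Fraction(1, len(codomain))
    return max(abs(mass[y] - uniform) for y in codomain)


# Toy distribution on {0, 1, 2, 3} (an assumption for illustration).
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

parity = lambda x: x % 2       # Pr[parity = 0] = 1/2 + 1/8 = 5/8
high_bit = lambda x: x // 2    # Pr[high_bit = 0] = 1/2 + 1/4 = 3/4

eps_parity = bias(parity, P, [0, 1])
eps_high = bias(high_bit, P, [0, 1])
```

Here `parity` is 1/8-biased and `high_bit` is 1/4-biased on `P`; neither is perfectly unbiased.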

2.3 min-entropy 

We use min-entropy to quantify how much randomness is contained in a probability distribution. The min- 
entropy of a distribution is a variant of the Shannon entropy which measures the amount of randomness in 
the worst-case as opposed to Shannon entropy which measures the expected amount of randomness in the 
distribution. Intuitively a distribution with min-entropy k contains k random bits. A distribution X is said 
to have min-entropy of at least k bits if the probability it assigns to each element in its range is bounded 
above by $2^{-k}$. A distribution with min-entropy at least $k$ is called a $k$-source.

Definition 2. The min-entropy of a random variable $X$, taking values in a set $V$, is the quantity

$$H_\infty(X) = \min_{v \in V} \bigl(-\log_2 \Pr[X = v]\bigr).$$
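
The contrast between the worst-case and the average-case measure can be seen in a few lines. The two distributions are made-up examples and `min_entropy` is a hypothetical helper name.

```python
import math


def min_entropy(dist):
    """H_inf(X) = -log2 of the largest point probability."""
    return -math.log2(max(dist.values()))


def shannon_entropy(dist):
    """Expected surprise, for comparison with the worst-case measure."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


uniform8 = {v: 1 / 8 for v in range(8)}
skewed = {"a": 0.5, "b": 0.25, "c": 0.25}

h_uniform = min_entropy(uniform8)
h_skewed = min_entropy(skewed)
s_skewed = shannon_entropy(skewed)
```

For the uniform distribution the two measures agree (3 bits); for the skewed one the min-entropy is 1 bit while the Shannon entropy is 1.5 bits, so the skewed distribution is a 1-source.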

2.4 Statistical Distance 

We use statistical distance as the measure of distance between two random variables. 

Definition 3. Let $X$ and $Y$ be random variables which both take values in a finite set $S$ with probability
distributions $P_X$ and $P_Y$. The statistical distance between $X$ and $Y$ is defined as

$$\Delta[X, Y] := \frac{1}{2}\,\|P_X - P_Y\|_1 = \frac{1}{2} \sum_{s \in S} |P_X(s) - P_Y(s)|.$$

We say that $X$ and $Y$ are $\epsilon$-close if $\Delta[X, Y] \le \epsilon$. In other words, $X$ and $Y$ are $\epsilon$-close if $|P_X(S') - P_Y(S')| \le \epsilon$
for every event $S' \subseteq S$.

The statistical distance is the largest possible difference between the probabilities that the two probability 
distributions can assign to the same event. If the statistical distance between two random variables is small, 
then no probabilistic algorithm can distinguish between them without sampling a large amount of data. We 
will use the following properties of statistical distance which follow directly from the definition. 
Theorem 4. Let $X$, $Y$ and $Z$ be random variables taking values in a finite set $S$. We have

1. $0 \le \Delta[X, Y] \le 1$

2. $\Delta[X, Z] \le \Delta[X, Y] + \Delta[Y, Z]$ (triangle inequality)

In this paper, we often use statistical distance as the measure of distance between two probability distri- 
butions described by the random variables. We also use the terms statistical distance and distance in total 
variation interchangeably to mean the same measure. 
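
The definition and the "largest difference on any event" reading of statistical distance can be compared by brute force on a tiny support. The distributions `X`, `Y`, `Z` are made-up examples and the helper names are hypothetical.

```python
from itertools import chain, combinations


def stat_dist(p, q):
    """Delta[X, Y] = 1/2 * sum over s of |P_X(s) - P_Y(s)|."""
    support = set(p) | set(q)
    return sum(abs(p.get(s, 0) - q.get(s, 0)) for s in support) / 2


def max_event_gap(p, q):
    """Largest |P_X(S') - P_Y(S')| over all events S', by exhaustive search."""
    support = sorted(set(p) | set(q))
    events = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1)
    )
    return max(abs(sum(p.get(s, 0) - q.get(s, 0) for s in ev)) for ev in events)


X = {"a": 0.5, "b": 0.3, "c": 0.2}
Y = {"a": 0.25, "b": 0.25, "c": 0.5}
Z = {"a": 0.25, "b": 0.5, "c": 0.25}

d_xy = stat_dist(X, Y)
gap_xy = max_event_gap(X, Y)
```

The exhaustive search is exponential in the support size and is only meant as a check on small examples; `stat_dist` is the quantity to use in practice.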

Theorem 5 ([12]). Let $X$ and $Y$ be random variables which take values in a finite set $S$ with probability
distributions $P_X$ and $P_Y$. For every set $S' \subseteq S$,

$$\Delta[X, Y] \ge |P_X(S') - P_Y(S')|,$$

and equality holds for some $S' \subseteq S$ and in particular, for the set

$$S' := \{s \in S : P_X(s) < P_Y(s)\},$$

as well as its complement.

Proof. Let us partition the set $S$ into two disjoint subsets $S_0$ and $S_1$ as defined below:

$$S_0 = \{s \in S \mid P_X(s) < P_Y(s)\} \quad \text{and} \quad S_1 = \{s \in S \mid P_X(s) \ge P_Y(s)\}.$$

Since $P_X$ and $P_Y$ are probability distributions,

$$\sum_{s \in S_0} P_X(s) + \sum_{s \in S_1} P_X(s) = \sum_{s \in S_0} P_Y(s) + \sum_{s \in S_1} P_Y(s) = 1.$$

This implies that

$$\sum_{s \in S_0} |P_X(s) - P_Y(s)| = \sum_{s \in S_1} |P_X(s) - P_Y(s)|.$$

Now,

$$\begin{aligned}
\Delta[X, Y] &= \frac{1}{2} \sum_{s \in S} |P_X(s) - P_Y(s)| \\
&= \frac{1}{2} \sum_{s \in S_0} |P_X(s) - P_Y(s)| + \frac{1}{2} \sum_{s \in S_1} |P_X(s) - P_Y(s)| \\
&= \sum_{s \in S_0} |P_X(s) - P_Y(s)| \quad \text{or} \quad \sum_{s \in S_1} |P_X(s) - P_Y(s)|.
\end{aligned}$$

For any subset $S'$ of $S$,

$$\begin{aligned}
P_X(S') - P_Y(S') &= P_X(S' \cap S_0) + P_X(S' \cap S_1) - \bigl(P_Y(S' \cap S_0) + P_Y(S' \cap S_1)\bigr) \\
&= \bigl(P_X(S' \cap S_1) - P_Y(S' \cap S_1)\bigr) - \bigl(P_Y(S' \cap S_0) - P_X(S' \cap S_0)\bigr).
\end{aligned}$$

So, we can observe that $|P_X(S') - P_Y(S')|$ is maximized when $S' = S_0$ or $S' = S_1$, in which case it is
equal to $\Delta[X, Y]$. When $\Delta[X, Y]$ is small, there is no statistical test that can effectively distinguish
between the distributions of $X$ and $Y$. □
Theorem 6 ([12]). If $S$ and $T$ are finite sets, $X$ and $Y$ are random variables taking values in the set $S$ with
probability distributions $P_X$ and $P_Y$, and $f : S \to T$ is a function, then $\Delta[f(X), f(Y)] \le \Delta[X, Y]$.

Proof. We know from Theorem 5 that

$$\begin{aligned}
\Delta[f(X), f(Y)] &= |P_{f(X)}(T') - P_{f(Y)}(T')| \quad \text{for some } T' \subseteq T \\
&= |P_X(f^{-1}(T')) - P_Y(f^{-1}(T'))| \\
&\le \Delta[X, Y],
\end{aligned}$$

where the final inequality follows again from Theorem 5. □
See [12] for further discussion of statistical distance and its properties.



2.5 Randomness Extractors 

Randomness extractors are deterministic functions that operate on arbitrary distributions with sufficient 
randomness and output "almost" uniformly distributed, independent random bits. Extractors require an
additional input: a short seed of truly random bits as a catalyst to "extract" randomness from such distribu- 
tions, i.e., the input of an extractor contains two independent sources of randomness: the actual distribution 
(the source) and the seed. 

Definition 4. A $(k, \epsilon)$-extractor is a function

$$\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$$

such that for every distribution $X$ on $\{0,1\}^n$ with $H_\infty(X) \ge k$, the distribution $\mathrm{Ext}(X, U_d)$ is $\epsilon$-close to the
uniform distribution on $\{0,1\}^m$.

For our application, we require a stronger property from the extractor. We need the output of the 
extractor to remain uniform given the knowledge of the seed used. In other words, we require the extractor 
to extract randomness only from the source and not from the seed. A way of enforcing this condition is 
to demand that when the seed is concatenated to the output, the resulting distribution is still e-close to 
uniform. Such an extractor is called a strong extractor to distinguish it from the non-strong extractors defined
above. Non-strong extractors are guaranteed to extract randomness from $k$-sources on an average seed, while
strong extractors are guaranteed to extract randomness for most seeds. In this paper, we use the term extractor
to refer to a strong extractor. Strong extractors were first defined by Nisan and Zuckerman [2].

Definition 5. A $(k, \epsilon)$-strong extractor is a function

$$\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$$

such that for every distribution $X$ on $\{0,1\}^n$ with $H_\infty(X) \ge k$, the distribution $U_d \circ \mathrm{Ext}(X, U_d)$ is $\epsilon$-close
to the uniform distribution on $\{0,1\}^{m+d}$.

We refer to $n$ as the length of the source, to $k$ as the min-entropy threshold, to $\epsilon$ as the error of
the extractor, to the ratio $k/n$ as the entropy rate of the source $X$, and to the ratio $m/k$ as the fraction of
randomness extracted by the extractor. The entropy loss of the extractor is defined as $k + d - m$. The two
inputs of the extractor have joint min-entropy of at least $k + d$ and the entropy loss measures how much of
this randomness was "lost" in the extraction process. Radhakrishnan and Ta-Shma [8] showed that no
non-trivial $(k, \epsilon)$-extractor can extract all the randomness present in its inputs: each suffers an entropy
loss of $\chi = 2\log(1/\epsilon) + O(1)$. For our application, we need efficient, explicit strong extractor constructions
as defined below.
Definition 6 ([11]). For functions $k(n)$, $\epsilon(n)$, $d(n)$, $m(n)$, a family $\mathrm{Ext} = \{\mathrm{Ext}_n\}$ of functions

$$\mathrm{Ext}_n : \{0,1\}^n \times \{0,1\}^{d(n)} \to \{0,1\}^{m(n)}$$

is an explicit $(k, \epsilon)$-strong extractor if $\mathrm{Ext}(x, y)$ can be computed in time polynomial in its input length,
$\mathrm{poly}(n, d(n))$, and for every $n$, $\mathrm{Ext}_n$ is a $(k(n), \epsilon(n))$-extractor.

An important property of strong extractors which makes them attractive for our application is that for any
$k$-source, a $(1 - \epsilon)$ fraction of the seeds extract randomness from that source. The following theorem asserts
this statement formally and follows directly from the definition of a strong extractor.

Theorem 7 ([10]). Let $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ be a $(k, \epsilon)$-strong extractor, i.e., for
every distribution $X$ on $\{0,1\}^n$ with $H_\infty(X) \ge k$, the distribution $U_d \circ \mathrm{Ext}(X, U_d)$ is $\epsilon$-close to the uniform
distribution on $\{0,1\}^{m+d}$. Let $s \in_R \{0,1\}^d$. Then,

$$\Pr_s\bigl[\Delta[\mathrm{Ext}(X, s), U_m] \ge \sqrt{\epsilon}\bigr] \le \sqrt{\epsilon}.$$

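Proof. From the definition of a $(k, \epsilon)$-strong extractor we know that $\Delta[U_d \circ \mathrm{Ext}(X, U_d), U_{m+d}] \le \epsilon$. By the
definition of statistical distance, for $\omega \in \{0,1\}^m$ and seed $S \in \{0,1\}^d$, this can be written as

$$\frac{1}{2} \sum_{s, \omega} \bigl|\Pr[S = s \wedge \mathrm{Ext}(X, s) = \omega] - 2^{-(m+d)}\bigr| \le \epsilon.$$

Since $S$ is uniform, $\Pr[S = s \wedge \mathrm{Ext}(X, s) = \omega] = 2^{-d} \Pr[\mathrm{Ext}(X, s) = \omega]$, and the expectation can then be obtained as

$$E_s\bigl[\Delta[\mathrm{Ext}(X, s), U_m]\bigr] \le \epsilon.$$

We now invoke Markov's inequality from Theorem 1 to conclude that

$$\Pr_s\bigl[\Delta[\mathrm{Ext}(X, s), U_m] \ge \sqrt{\epsilon}\bigr] \le \frac{E_s\bigl[\Delta[\mathrm{Ext}(X, s), U_m]\bigr]}{\sqrt{\epsilon}} \le \frac{\epsilon}{\sqrt{\epsilon}} = \sqrt{\epsilon}.$$

□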
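
The averaging step in this proof can be checked numerically on a toy example. The sketch below is an illustration only: it uses the inner product modulo 2 as a one-bit stand-in for an extractor (not the construction used in this paper), a hand-picked source that is uniform on four 4-bit strings, and it takes $\epsilon$ to be the measured average distance over all seeds.

```python
import math
from itertools import product


def inner_product_bit(x, s):
    """Toy one-bit 'extractor': <x, s> mod 2 (an illustrative stand-in)."""
    return sum(a & b for a, b in zip(x, s)) % 2


def dist_from_uniform_bit(source, s):
    """Delta[Ext(X, s), U_1] for X uniform on the list `source`."""
    p_one = sum(inner_product_bit(x, s) for x in source) / len(source)
    return abs(p_one - 0.5)


n = 4
# X is uniform on these four strings, so it has min-entropy 2.
source = [(0, 0, 0, 1), (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 0)]
seeds = list(product((0, 1), repeat=n))

per_seed = [dist_from_uniform_bit(source, s) for s in seeds]
eps = sum(per_seed) / len(seeds)          # E_s[ Delta[Ext(X, s), U_1] ]
threshold = math.sqrt(eps)
bad_fraction = sum(d >= threshold for d in per_seed) / len(seeds)
```

Markov's inequality guarantees `bad_fraction <= threshold` whenever `eps` is the average of the per-seed distances, which is exactly how the theorem turns an average-case guarantee into a statement about most seeds.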



See the survey papers [11, 6, 7] for more details on extractors and their properties. In this paper, we use
the explicit strong extractor construction by Raz, Reingold and Vadhan [9] which works on sources of any
min-entropy on strings of length $n$. It extracts all the min-entropy using $O(\log^3 n)$ additional random seed
bits while achieving an optimal entropy loss (up to an additive constant) of $\chi = 2\log(1/\epsilon) + O(1)$ bits.

Theorem 8 (RRV Extractor [9]). For every $n, k \in \mathbb{N}$, and $\epsilon > 0$ such that $k \le n$, there are explicit
$(k, \epsilon)$-strong extractors

$$\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^{k - \chi}$$

with entropy loss $\chi = 2\log(1/\epsilon) + O(1)$ bits, which require seeds of length $d = O(\log^2 n \cdot \log(1/\epsilon) \cdot \log k)$ bits.
2.6 Channel



The security of a steganography protocol is measured by the adversary's ability to distinguish between 
"normal" and "covert" messages over a communication channel. To characterize normal communication we 
need to define and formalize the communication channel. We follow the standard terminology used in the 
literature [3, 1, 14] to define communication channels. 

We let $\Sigma = \{\sigma_1, \ldots, \sigma_s\}$ denote the symbols of an alphabet and treat the channel, which will be used for
data transmission, as a family of random variables $\mathcal{C} = \{C_h\}_{h \in \Sigma^*}$; each $C_h$ is supported on $\Sigma$. These channel
distributions model a history-dependent notion of channel data: if $h_1, h_2, \ldots, h_\ell$ have been sent along the
channel thus far, $C_{h_1, \ldots, h_\ell}$ determines the distribution of the next channel element.

We let $C_h$ denote the marginal channel distribution on a single symbol from $\Sigma$ conditioned on the history $h$
of already drawn symbols and $C_h^t$ denote the marginal distribution on sequences of $t$ symbols conditioned on
history h. This definition of a channel differs from the typical setting where every symbol from the alphabet 
is drawn independently according to some fixed distribution. This definition captures the adaptive nature of 
the channel by making explicit the dependence between the symbols typical in real world communications. 

We assume that the channel satisfies a min-entropy constraint for all histories. This assumption is
important and reasonable since without this assumption it is not possible to maintain positive information
content in communications. In particular, we require that $\mathcal{C}$ has min-entropy $\delta$, so that $\forall h \in \Sigma^*$, $H_\infty(C_h) \ge \delta$.
Observe that this definition implies that $H_\infty(C_h^t) \ge \delta t$.
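
A history-dependent channel and the bound $H_\infty(C_h^t) \ge \delta t$ can be illustrated on a toy example. The three-symbol channel below, in which the last symbol sent is less likely to repeat, is an assumption made purely for illustration, as are the names `channel` and `block_dist`.

```python
import math
from itertools import product

ALPHABET = ("a", "b", "c")


def channel(history):
    """Toy C_h: uniform on an empty history, otherwise the last symbol is less likely."""
    if not history:
        return {s: 1 / 3 for s in ALPHABET}
    last = history[-1]
    return {s: (0.2 if s == last else 0.4) for s in ALPHABET}


def block_dist(history, t):
    """C_h^t: the distribution on sequences of t symbols following `history`."""
    dist = {}
    for seq in product(ALPHABET, repeat=t):
        p, h = 1.0, tuple(history)
        for sym in seq:
            p *= channel(h)[sym]
            h += (sym,)
        dist[seq] = p
    return dist


def min_entropy(dist):
    return -math.log2(max(dist.values()))


delta = -math.log2(0.4)      # every C_h gives each symbol probability at most 0.4
t = 3
blocks = block_dist(("a",), t)
h_block = min_entropy(blocks)
```

Because each conditional distribution assigns at most $2^{-\delta}$ to any symbol, every length-$t$ sequence has probability at most $2^{-\delta t}$, which is the inequality observed above.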

2.7 Stegosystem 

Definition 7. A one-time stegosystem consists of three probabilistic polynomial time algorithms

$$S = (SK, SE, SD)$$

where:

• $SK$ is the key generation algorithm; we write $SK(1^\nu, \log(1/\epsilon_{sec})) = k$. It takes as input the security
parameter $\epsilon_{sec}$ and the length of the message $\nu$ and produces a key $k$ of length $\kappa$. (We typically assume
that $\kappa = \kappa(\nu)$ is a monotonically increasing function of $\nu$.)

• $SE$ is the embedding procedure, which can access the channel; $SE(1^\nu, k, m, h) = c_{stego} \in \Sigma^*$. It takes
as input the length of the message $\nu$, the key $k$, a message $m \in M_\nu = \{0,1\}^\nu$ to be embedded, and the
history $h$ of previously drawn covertexts. The output is the stegotext $c_{stego} \in \Sigma^*$.

• $SD$ is the extraction procedure; $SD(1^\nu, k, c \in \Sigma^*) = m$ or fail. It takes as input $1^\nu$, $k$, and some
$c \in \Sigma^*$. The output is a message $m$ or the token fail.

Recall that the min-entropy of a random variable $X$, taking values in a set $V$, is the quantity

$$H_\infty(X) = \min_{v \in V} \bigl(-\log \Pr[X = v]\bigr).$$

We say that a channel $\mathcal{C}$ has min-entropy $\delta$ if for all $h \in \Sigma^*$, $H_\infty(C_h) \ge \delta$.

Definition 8 (Soundness). A stegosystem $S = (SK, SE, SD)$ is said to be $(\psi(\kappa), \delta)$-sound provided that for
all channels $\mathcal{C}$ of min-entropy $\delta$,

$$\forall m \in M_\nu, \quad \Pr[SD(1^\nu, k, SE(1^\nu, k, m, h)) \ne m \mid k \leftarrow SK(1^\nu, \log(1/\epsilon_{sec}))] \le \psi(\kappa).$$

One-time stegosystem security is based on the indistinguishability between a transmission that contains
a steganographically embedded message and a transmission that contains no embedded messages. An
adversary $A$ against a one-time stegosystem $S = (SK, SE, SD)$ is a pair of algorithms $A = (SA_1, SA_2)$ that
plays the following game, denoted $G^A(1^\nu)$:

1. A key $k$ is generated by $SK(1^\nu, \log(1/\epsilon_{sec}))$.
2. Algorithm $SA_1$ receives as input the length of the message $\nu$ and outputs a triple $(m^*, \theta, h_c) \in M_\nu \times
\{0,1\}^* \times \Sigma^*$, where $\theta$ is some additional information that will be passed to $SA_2$. $SA_1$ is provided access
to $\mathcal{C}$ via an oracle $O(h)$, which takes the history $h$ as input. $O(\cdot)$, on input $h$, returns to $SA_1$ an element
$c$ selected according to $C_h$.

3. A bit $b$ is chosen uniformly at random.

• If $b = 0$ let $c^* \leftarrow SE(1^\nu, k, m^*, h)$, so $c^*$ is a stegotext.

• If $b = 1$ let $c^* = c_1 \circ \cdots \circ c_\lambda$, where $\circ$ denotes string concatenation and $c_i \leftarrow C_{h \circ c_1 \circ \cdots \circ c_{i-1}}$.

4. The input for $SA_2$ is $1^\nu$, $h_c$, $c^*$ and $\theta$. $SA_2$ outputs a bit $b'$. If $b' = b$ then we say that $(SA_1, SA_2)$
succeeded and write $G^A(1^\nu) = \text{success}$.

The advantage of the adversary $A$ over a stegosystem $S$ is defined as:

$$\mathrm{Adv}_S^A(\nu) = \Bigl|\Pr[G^A(1^\nu) = \text{success}] - \frac{1}{2}\Bigr|.$$

The probability includes the coin tosses of $A$ and $SE$, as well as the coin tosses of $G^A(1^\nu)$. The (information-
theoretic) insecurity of the stegosystem is defined as

$$\mathrm{InSec}_S(\nu) = \max_A \{\mathrm{Adv}_S^A(\nu)\},$$

this maximum taken over all (time unbounded) adversaries $A$.
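
For intuition about how this advantage relates to statistical distance, consider a simplified version of the game in which the stegotext and covertext distributions are fixed finite distributions and $b$ is uniform. Under that simplifying assumption (the message, key and history handling of the full game are ignored), the best possible guesser succeeds with probability $1/2 + \Delta/2$. The two distributions below are hypothetical.

```python
def stat_dist(p, q):
    """Delta = 1/2 * sum over c of |p(c) - q(c)|."""
    support = set(p) | set(q)
    return sum(abs(p.get(c, 0) - q.get(c, 0)) for c in support) / 2


def best_success(stego, cover):
    """Success probability of the optimal guess for a uniform bit b.

    b = 0 means c* was drawn from `stego`, b = 1 means it was drawn from `cover`;
    the optimal rule guesses whichever distribution gives c* more mass.
    """
    support = set(stego) | set(cover)
    return sum(max(stego.get(c, 0), cover.get(c, 0)) for c in support) / 2


cover = {"c1": 0.5, "c2": 0.25, "c3": 0.25}
stego = {"c1": 0.375, "c2": 0.375, "c3": 0.25}   # hypothetical SE output distribution

advantage = abs(best_success(stego, cover) - 0.5)
delta = stat_dist(stego, cover)
```

In this simplified setting `advantage` equals `delta / 2`, which is why bounding the statistical distance between stegotext and covertext bounds the insecurity against time-unbounded adversaries.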

Definition 9 (Security). We say that a stegosystem $S = (SK, SE, SD)$ is $(\phi(\nu), \delta)$-secure if for all channels
with min-entropy $\delta$ we have $\mathrm{InSec}_S(\nu) \le \phi(\nu)$.



2.8 Rejection Sampling 

A common method used in steganography employing a channel distribution is that of rejection sampling (cf.
[1, 3]). We use a variant of rejection sampling to transmit bit vectors as opposed to a single bit. Assuming
that one wishes to transmit a bit vector $m \in \{0,1\}^\nu$ and employs a random function $f : \{0,1\}^q \times \Sigma^t \to \{0,1\}^\nu$
that is secret from the adversary, one performs the following "rejection sampling" process:

    rejsam_h^f(m, ρ)
        let j = 0
        repeat:
            sample c ← C_h^t, increment j
        until f(c) = m or (j > ρ)
        output: c





For a given history $h$, the procedure $\mathrm{rejsam}_h^f(m, \rho)$ draws independent samples from the channel distribution
$C_h^t$ in rounds until $f(c) = m$ or $(j > \rho)$. As there are at most a total of $\rho + 1$ rounds, if none of the first $\rho$
samples drawn map to the target bit vector, the sample drawn at round $\rho + 1$ is returned by the procedure.
Here, as defined before, $\Sigma$ denotes the output alphabet of the channel, $h$ denotes the history of the channel
data at the start of the process, and $C_h^t$ denotes the marginal distribution on sequences of $t$ symbols, $\Sigma^t$,
given by the channel after history $h$. The receiver (also privy to the function $f$) applies the function to
the received message $c \in \Sigma^t$ and recovers $m$ with a non-negligible success probability. The sender and the
receiver may employ a joint state denoted by $q$ in the above process (e.g., a counter), that need not be secret
from the adversary. Note that the above process performs $\rho + 1$ draws from the channel with the same
history. These draws are assumed to be independent. One basic property of rejection sampling that we use
is:
Lemma 9 ([14]). If the function $f$ is $\epsilon$-biased on $C_h^t$ for history $h$, then for any $\rho$ and $m \in_R \{0,1\}^\nu$:

$$\Delta\bigl[\mathrm{rejsam}_h^f(m, \rho), C_h^t\bigr] \le \epsilon.$$



Proof. Let us denote the samples drawn by the procedure $\mathrm{rejsam}_h^f(m, \rho)$ as $c_i$, $i = 1, \ldots, \rho + 1$. Suppose
the target bit vector $m$ was chosen with the probability $\Pr[f(C_h^t) = m]$; we first show that the output from
$\mathrm{rejsam}_h^f(m, \rho)$ is distributed identically to $C_h^t$. For simplicity of notation define $p_m^c = \Pr[f(c) = m]$ for
$c \leftarrow C_h^t$. $p_c$ denotes the probability of drawing $c$ from the channel distribution $C_h^t$, i.e., $p_c = \Pr_{c' \leftarrow C_h^t}[c' = c]$.
The probability of observing $c$ under the $\mathrm{rejsam}_h^f(m, \rho)$ procedure is then given by

$$\begin{aligned}
\Pr[\mathrm{rejsam}_h^f(m, \rho) = c] &= \Pr_{c_1 \leftarrow C_h^t}[c_1 = c] \cdot \Pr[f(c_1) = m] + \Pr_{c_2 \leftarrow C_h^t}[c_2 = c] \cdot \Pr[f(c_2) = m] \cdot \Pr[f(c_1) \ne m] \\
&\quad + \Pr_{c_3 \leftarrow C_h^t}[c_3 = c] \cdot \Pr[f(c_3) = m] \cdot \Pr[f(c_1) \ne m \wedge f(c_2) \ne m] + \cdots \\
&= p_c p_m^c + p_c p_m^c (1 - p_m^c) + p_c p_m^c (1 - p_m^c)^2 + p_c p_m^c (1 - p_m^c)^3 \\
&\quad + \cdots + p_c p_m^c (1 - p_m^c)^{\rho - 1} + p_c (1 - p_m^c)^{\rho} \\
&= p_c p_m^c \bigl(1 + (1 - p_m^c) + (1 - p_m^c)^2 + (1 - p_m^c)^3 + \cdots + (1 - p_m^c)^{\rho - 1}\bigr) + p_c (1 - p_m^c)^{\rho} \\
&= p_c p_m^c \cdot \frac{1 - (1 - p_m^c)^{\rho}}{p_m^c} + p_c (1 - p_m^c)^{\rho} \\
&= p_c .
\end{aligned}$$

But since $\Delta[m, U_\nu] \le \epsilon$, it must be the case that $\Delta[\mathrm{rejsam}_h^f(m, \rho), C_h^t] \le \epsilon$ by Theorem 6. □
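
The behaviour of the bounded rejection-sampling procedure of Section 2.8 can be examined exactly on a small example. The sketch below computes the output distribution directly from the procedure's description (the first $\rho$ draws are returned only on a match, draw $\rho + 1$ is returned unconditionally) and averages it over a uniformly chosen target. The four-symbol channel, the functions `f` and `g`, and the helper names are assumptions made for illustration.

```python
from fractions import Fraction


def rejsam_dist(channel, f, m, rho):
    """Exact output distribution of rejsam(m, rho) for a fixed target m."""
    p_hit = sum(p for c, p in channel.items() if f(c) == m)
    miss = 1 - p_hit
    early = sum(miss ** j for j in range(rho))      # reach draw j + 1 <= rho
    out = {}
    for c, p in channel.items():
        accepted = p * early if f(c) == m else 0    # returned on a matching draw
        out[c] = accepted + p * miss ** rho         # draw rho + 1, unconditional
    return out


def average_over_targets(channel, f, targets, rho):
    """Output distribution when the target m is uniform over `targets`."""
    avg = {c: Fraction(0) for c in channel}
    for m in targets:
        for c, p in rejsam_dist(channel, f, m, rho).items():
            avg[c] += p / len(targets)
    return avg


def stat_dist(p, q):
    return sum(abs(p[c] - q[c]) for c in p) / 2


channel = {"a": Fraction(1, 4), "b": Fraction(1, 4),
           "c": Fraction(3, 8), "d": Fraction(1, 8)}

f = {"a": 0, "b": 0, "c": 1, "d": 1}.get   # perfectly unbiased: Pr[f = 0] = 1/2
g = {"a": 0, "b": 0, "c": 0, "d": 1}.get   # 3/8-biased: Pr[g = 0] = 7/8

stego_f = average_over_targets(channel, f, [0, 1], rho=5)
stego_g = average_over_targets(channel, g, [0, 1], rho=5)
gap_g = stat_dist(stego_g, channel)
```

With the perfectly unbiased `f` and a uniform target, the output distribution coincides with the channel exactly; with the biased `g`, the output drifts away from the channel, and in this example the drift stays below the bias of `g`, as the lemma predicts.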

3 The construction 

In this section we outline our construction of a one-time stegosystem as an interaction between Alice (the 
sender) and Bob (the receiver). Alice and Bob wish to communicate over a channel with distribution C. For 
simplicity, we assume that the support of $C_h$ is of size $|\Sigma| = 2^b$. We also assume that $\mathcal{C}$ has min-entropy
$\delta$, so that $\forall h \in \Sigma^*$, $H_\infty(C_h) \ge \delta$ and by the additive property of min-entropy, $H_\infty(C_h^t) \ge \delta t$, i.e., $C_h$ is a
$\delta$-source and $C_h^t$ is a $\delta t$-source.

3.1 A one-time stegosystem 

Fix an alphabet $\Sigma$ for the channel, choose a message $m' \in \{0,1\}^\nu$ and the security parameter $\epsilon_{sec}$. Our
stegosystem uses the RRV strong-extractor construction as described in Theorem 8, which extracts
randomness from the marginal channel distribution $C_h^t$ supported on $\{0,1\}^{t \cdot b}$, by rejection sampling as described in
Section 2.8.

Alice and Bob agree on the following:



Extractor Construction. Alice and Bob agree to use the explicit RRV strong-extractor construction as
described in Theorem 8. They use a shared seed $s \in_R \{0,1\}^d$. The notation $E_s = E_s(\cdot)$ stands for the
extractor $E(\cdot, \cdot)$ used with the seed $s$, i.e., $E(\cdot, s)$.

One-Time Pad. Alice and Bob use a shared secret key $k^{OTP} \in_R \{0,1\}^\nu$ to randomize their message.

Key generation consists of selecting the seed $s \in_R \{0,1\}^d$ and the one-time pad secret key $k^{OTP} \in_R \{0,1\}^\nu$.
For our protocol, the shared seed need not be secret. The encoding procedure accepts an input message and
outputs stegotext of length $\lambda$. We will analyze the stegosystem below in terms of arbitrary parameters $t$, $d$,
$\lambda$, $c$ and $\epsilon_{sec}$, relegating discussion of how these parameters determine the overall efficiency of the system to
Section 3.4.
Alice and Bob then communicate using the algorithm SE for embedding and SD for extracting described
in Figure 1. In SE, after applying the one-time pad, we use $\mathrm{rejsam}_h^{E_s}(m_i, \rho)$ to obtain an element $c_i$ of the
channel for each block $m_i$ of the message. Here, the history $h$ represents the current history at the time of
rejection sampling, which gets updated after sampling. The resulting stegotext $c_1 c_2 \cdots c_{\lceil \nu / (c \log \nu) \rceil}$ is denoted
$c_{stego}$. In SD the received stegotext is parsed block by block by evaluating the extractor using seed $s$; this
results in a message block. After performing this for each received block, a message of size $\nu$ is received,
which is subjected to the one-time pad decoding to obtain the original message.



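PROCEDURE SE($k$, $m'$, $h$):

    Input: Key $k = (k^{OTP}, s)$; hidden text $m' \in \{0,1\}^\nu$; history $h \in \Sigma^*$
    let $m = k^{OTP} \oplus m'$
    parse $m$ as $m = m_1 m_2 \ldots m_{\lceil \nu / (c \log \nu) \rceil}$
    for $i = 1$ to $\lceil \nu / (c \log \nu) \rceil$ {
        $c_i \leftarrow \mathrm{rejsam}_h^{E_s}(m_i, \rho)$
        set $h \leftarrow h \circ c_i$
    }
    Output: $c_{stego} = c_1 c_2 \ldots c_{\lceil \nu / (c \log \nu) \rceil} \in \Sigma^\lambda$

PROCEDURE SD($k$, $c_{stego}$):

    Input: Key $k = (k^{OTP}, s)$; stegotext $c_{stego}$
    parse $c_{stego}$ as $c = c_1 \circ c_2 \ldots c_{\lceil \nu / (c \log \nu) \rceil}$
    for $i = 1$ to $\lceil \nu / (c \log \nu) \rceil$ do {
        set $m_i = E_s(c_i)$
    }
    set $m = m_1 m_2 \ldots m_{\lceil \nu / (c \log \nu) \rceil}$
    Output: $m \oplus k^{OTP}$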



Figure 1: Encryption and Decryption algorithms for the one-time stegosystem of 3.1. 
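
The two procedures of Figure 1 can be exercised end to end with a toy sketch. Several parts are assumptions made for illustration and are not the paper's construction: the extractor $E_s$ is replaced by a truncated SHA-256 of the seed and the block (this is not the RRV extractor), the channel is memoryless and emits random 8-byte blocks, and the names `extract`, `rejsam`, `se`, `sd` and the parameter values are hypothetical.

```python
import hashlib
import random


def extract(seed: bytes, block: bytes, n_bits: int) -> int:
    """Stand-in for E_s: the top n_bits of SHA-256(seed || block) as an integer.
    This is NOT the RRV extractor; it only plays its role in this sketch."""
    digest = int.from_bytes(hashlib.sha256(seed + block).digest(), "big")
    return digest >> (256 - n_bits)


def rejsam(draw_block, seed, target, n_bits, rho):
    """Draw blocks until one extracts to `target`; draw rho + 1 is returned as-is."""
    block = draw_block()
    for _ in range(rho):
        if extract(seed, block, n_bits) == target:
            break
        block = draw_block()
    return block


def se(k_otp, seed, message_bits, draw_block, n_bits, rho):
    """Embed: mask with the one-time pad, then rejection-sample one block per chunk."""
    masked = [m ^ k for m, k in zip(message_bits, k_otp)]
    stego = []
    for i in range(0, len(masked), n_bits):
        chunk = masked[i:i + n_bits]
        target = int("".join(map(str, chunk)), 2)
        stego.append(rejsam(draw_block, seed, target, len(chunk), rho))
    return stego


def sd(k_otp, seed, stego, n_bits, total_bits):
    """Extract: apply E_s to each block, reassemble, and remove the one-time pad."""
    masked = []
    for i, block in enumerate(stego):
        width = min(n_bits, total_bits - i * n_bits)
        value = extract(seed, block, width)
        masked += [int(b) for b in format(value, f"0{width}b")]
    return [m ^ k for m, k in zip(masked, k_otp)]


rng = random.Random(1)
# Toy memoryless channel (an assumption): each block is 8 random bytes.
draw_block = lambda: bytes(rng.randrange(256) for _ in range(8))

message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
k_otp = [rng.randrange(2) for _ in message]
seed = bytes(rng.randrange(256) for _ in range(16))

stego = se(k_otp, seed, message, draw_block, n_bits=3, rho=400)
decoded = sd(k_otp, seed, stego, n_bits=3, total_bits=len(message))
```

With 3-bit blocks a single draw matches with probability about 1/8, so 400 rounds make a decoding failure extremely unlikely; a smaller `rho` trades soundness for fewer channel draws.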



The detailed security and correctness analyses follow in the next two sections.

3.2 Security

In this section we argue about the security of our one-time stegosystem. We wish to quantify the security
of our stegosystem by the statistical distance between the "normal" and "covert" message distributions over
the communication channel. First, by Lemma 9, observe that if the function $f$ is $\epsilon$-biased on $C_h^t$ for history $h$,
then for any $\rho$ and $m \in_R \{0,1\}^\nu$: $\Delta[\mathrm{rejsam}_h^f(m, \rho), C_h^t] \le \epsilon$. It follows from the definition of a strong extractor
and Theorem 7 that for a uniformly chosen seed $s \in_R \{0,1\}^d$, $\Pr_s[\Delta[\mathrm{Ext}(X, s), U_m] \ge \sqrt{\epsilon}] \le \sqrt{\epsilon}$. So,

$$\Delta\bigl[\mathrm{rejsam}_h^{E_s}(m, \rho), C_h^t\bigr] \le (1 - \sqrt{\epsilon}) \cdot \sqrt{\epsilon} + \sqrt{\epsilon} \cdot 1 = 2\sqrt{\epsilon} - \epsilon < 2\sqrt{\epsilon}.$$

Suppose that in our stegosystem construction (Figure 1) we had used an independent and uniformly chosen
seed $s_i \in \{0,1\}^d$ for each message block $i = 1, 2, \ldots, \lceil \nu / (c \log \nu) \rceil$. Then the statistical distance between the natural
channel distribution and the output of procedure SE would be bounded by

$$\Delta\bigl[SE(k, m', h), C_h^\lambda\bigr] \le 2\sqrt{\epsilon}\,\lceil \nu / (c \log \nu) \rceil$$

by the triangle inequality.

Next, we present an upper bound on the statistical distance between the "natural" channel distribution
and the output of the encoding procedure SE when using a single seed $s \in_R \{0,1\}^d$ over all the message
blocks as in our construction.

We need the following technical lemmas to prove the results of this section. 

Lemma 10. Consider random variables $X, X' \in \mathcal{X}$ and $Y_x, Y'_x \in \mathcal{Y}$ for each $x \in \mathcal{X}$. Then,

$$\Delta[(X, Y_X), (X', Y'_{X'})] \le \Delta[X, X'] + \Delta[Y_X, Y'_X].$$
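Proof. For $x \in \mathcal{X}$ denote $\Pr[X = x]$ by $P_x$ and $\Pr[Y_x = y]$ by $P_{y|x}$, and define $P'_x$ and $P'_{y|x}$ analogously for $X'$ and $Y'_x$. Then we get

$$\begin{aligned}
\Delta[(X, Y_X), (X', Y'_{X'})] &= \frac{1}{2} \sum_{x, y} \bigl|P_x \cdot P_{y|x} - P'_x \cdot P'_{y|x}\bigr| \\
&\le \frac{1}{2} \sum_{x, y} \bigl|P_x \cdot P_{y|x} - P'_x \cdot P_{y|x}\bigr| + \frac{1}{2} \sum_{x, y} \bigl|P'_x \cdot P_{y|x} - P'_x \cdot P'_{y|x}\bigr| \quad \text{(triangle inequality)} \\
&= \frac{1}{2} \sum_{x, y} P_{y|x} \cdot |P_x - P'_x| + \frac{1}{2} \sum_{x, y} P'_x \cdot |P_{y|x} - P'_{y|x}| \\
&= \Delta[X, X'] + \Delta[Y_X, Y'_X].
\end{aligned}$$

□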



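Lemma 11. Consider random variables $X, X' \in \mathcal{X}$ and $Y_x, Y'_x \in \mathcal{Y}$ for each $x \in \mathcal{X}$. Then for any constant
$\beta > 0$,

$$\Delta[Y_X, Y'_X] \le \Pr_X\bigl[\Delta[Y_X, Y'_X] > \beta\bigr] \cdot 1 + \Pr_X\bigl[\Delta[Y_X, Y'_X] \le \beta\bigr] \cdot \beta.$$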
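
Lemma 11 is stated without proof. One way to see it is sketched below, under the assumption that $\Delta[Y_X, Y'_X]$ on the left-hand side denotes the average of $\Delta[Y_x, Y'_x]$ over $x \leftarrow X$ (the reading suggested by the last step of the proof of Lemma 10). The sum is split according to whether the conditional distance exceeds $\beta$, and each part is bounded using $\Delta \le 1$ and $\Delta \le \beta$ respectively.

```latex
\begin{align*}
\Delta[Y_X, Y'_X]
  &= \sum_{x} \Pr[X = x]\,\Delta[Y_x, Y'_x] \\
  &= \sum_{x:\,\Delta[Y_x, Y'_x] > \beta} \Pr[X = x]\,\Delta[Y_x, Y'_x]
   + \sum_{x:\,\Delta[Y_x, Y'_x] \le \beta} \Pr[X = x]\,\Delta[Y_x, Y'_x] \\
  &\le \Pr_X\bigl[\Delta[Y_X, Y'_X] > \beta\bigr] \cdot 1
   + \Pr_X\bigl[\Delta[Y_X, Y'_X] \le \beta\bigr] \cdot \beta .
\end{align*}
```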

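Theorem 12. For a message $m' \in \{0,1\}^\nu$, the insecurity of the stegosystem $S$ of Section 3.1 is bounded
by $\ell \cdot (\sqrt{\epsilon_{sec}} + 2\sqrt[4]{\epsilon_{sec}})$, i.e., $\mathrm{InSec}_S(\nu) \le \ell \cdot (\sqrt{\epsilon_{sec}} + 2\sqrt[4]{\epsilon_{sec}})$, where $\epsilon_{sec}$ is the security parameter and
$\ell = \lceil \nu / (c \log \nu) \rceil$ for some constant $c$.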

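Proof. We start the encoding procedure SE with history $h$, which embeds message blocks into the channel
adaptively using rejection sampling. We want to show that the statistical distance between the output of
SE and the natural channel distribution satisfies

$$\Delta\bigl[SE(k, m', h), C_h^\lambda\bigr] \le \ell \cdot (\sqrt{\epsilon_{sec}} + 2\sqrt[4]{\epsilon_{sec}}),$$

where $\ell = \lceil \nu / (c \log \nu) \rceil$ for some constant $c$ and $\lambda$ is the length of the output of procedure SE.
First, we define some notation to capture the operation of the procedure SE. Let $C_0$ denote the channel
distribution $C_h$ at depth 0. Let $C_1$ denote the distribution at depth 1 that results by sampling $c_1 \leftarrow C_h$; $C_2$
denotes the distribution at depth 2 that results by sampling $c_1 \leftarrow C_h$ and $c_2 \leftarrow C_{h \circ c_1}$. Similarly, let $C_\tau$ denote
the channel distribution at depth $\tau$ that results by sampling $c_1 \leftarrow C_h$, $c_2 \leftarrow C_{h \circ c_1}$, $\ldots$, $c_\tau \leftarrow C_{h \circ c_1 \circ \cdots \circ c_{\tau-1}}$. We
define the random variables obtained by rejection sampling in the same fashion. Let us now formally define
these families of random variables $C_i$, $R_i$ at depth $i$, $i = 0, 1, \ldots, \ell$. Here, $m = k^{OTP} \oplus m'$ and $|m_i| = c \log \nu$ for $i = 1, \ldots, \ell$.

$$\begin{aligned}
C_0 &= C_h \\
C_1 &= C_{h \circ c_1} && \text{where } c_1 \leftarrow C_h \\
C_2 &= C_{h \circ c_1 \circ c_2} && \text{where } c_2 \leftarrow C_{h \circ c_1} \\
&\;\;\vdots \\
C_\tau &= C_{h \circ c_1 \circ c_2 \circ \cdots \circ c_{\tau-1} \circ c_\tau} && \text{where } c_\tau \leftarrow C_{h \circ c_1 \circ \cdots \circ c_{\tau-1}}
\end{aligned}$$

and

$$\begin{aligned}
R_0 &= C_h \\
R_1 &= C_{h \circ c_1} && \text{where } c_1 \leftarrow \mathrm{rejsam}_h^{E_s}(m_1, \rho) \\
R_2 &= C_{h \circ c_1 \circ c_2} && \text{where } c_2 \leftarrow \mathrm{rejsam}_{h \circ c_1}^{E_s}(m_2, \rho) \\
&\;\;\vdots \\
R_\tau &= C_{h \circ c_1 \circ c_2 \circ \cdots \circ c_{\tau-1} \circ c_\tau} && \text{where } c_\tau \leftarrow \mathrm{rejsam}_{h \circ c_1 \circ \cdots \circ c_{\tau-1}}^{E_s}(m_\tau, \rho)
\end{aligned}$$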
Now, pick a seed $s \in_R \{0,1\}^d$. First, we show that at each depth $\tau$, $\tau = 1, \ldots, \ell$, the probability mass of
distributions for which the extractor coupled with the seed $s$ yields a $\sqrt{\epsilon_{sec}}$-biased function is large.

We say that a distribution $C_{c_1, c_2, \ldots, c_\tau}$ is $(s, \sqrt{\epsilon_{sec}})$-good if $E_s = E_s(\cdot) = E(\cdot, s)$ is $\sqrt{\epsilon_{sec}}$-biased on $C$.
Otherwise we say that the distribution $C$ is $(s, \sqrt{\epsilon_{sec}})$-bad.

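Let us define the following sets:

$$G_s^\tau = \{(c_1, c_2, \ldots, c_\tau) \mid \mathcal{C}_{c_1, c_2, \ldots, c_\tau} \text{ is } (s, \sqrt{\epsilon_{sec}})\text{-good}\}$$

and

$$B_s^\tau = \{(c_1, c_2, \ldots, c_\tau) \mid \mathcal{C}_{c_1, c_2, \ldots, c_\tau} \text{ is } (s, \sqrt{\epsilon_{sec}})\text{-bad}\}.$$

$G_s^\tau$ and $B_s^\tau$ denote the collections of $(s, \sqrt{\epsilon_{sec}})$-good and $(s, \sqrt{\epsilon_{sec}})$-bad distributions at depth $\tau$, respectively.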

The basic property of a strong-extractor construction that we use here is that for any distribution $\mathcal{C}$ with the right min-entropy, the probability over the choice of the seed $s$ satisfies $\Pr_s[\mathcal{C} \text{ is } (s, \sqrt{\epsilon_{sec}})\text{-good}] \ge 1 - \sqrt{\epsilon_{sec}}$ by Theorem 7. This implies that $\Pr_s[\mathcal{C} \text{ is } (s, \sqrt{\epsilon_{sec}})\text{-bad}] \le \sqrt{\epsilon_{sec}}$.

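Define the quantity $\mu(B_s^\tau) = \sum_{c \in B_s^\tau} \Pr[\text{observing } c \text{ under the natural channel distribution}] = \sum_{c \in B_s^\tau} \Pr_{\mathcal{C}_h^\tau}[c]$ with the appropriate history $h$. This quantity $\mu(B_s^\tau)$ represents the probability mass of the set $B_s^\tau$. Define $\mu(G_s^\tau) = \sum_{c \in G_s^\tau} \Pr_{\mathcal{C}_h^\tau}[c]$ similarly.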

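Define the indicator random variable

$$X_s^c = \begin{cases} 1 & \text{if } \mathcal{C}_c \text{ is } (s, \sqrt{\epsilon_{sec}})\text{-good}, \\ 0 & \text{otherwise.} \end{cases}$$

The expected value is $E[X_s^c] = 1 \cdot \Pr[\mathcal{C}_c \text{ is } (s, \sqrt{\epsilon_{sec}})\text{-good}] \ge 1 - \sqrt{\epsilon_{sec}}$.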

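Now, notice that the expected mass of $G_s^\tau$ over the random choice of the seed $s$ is given by

$$E_s[\mu(G_s^\tau)] = E_s\Big[\sum_{c \in \mathcal{C}_h^\tau} \Pr_{\mathcal{C}_h^\tau}[c] \cdot X_s^c\Big] = \sum_{c \in \mathcal{C}_h^\tau} \Pr_{\mathcal{C}_h^\tau}[c] \cdot E[X_s^c] \ge 1 - \sqrt{\epsilon_{sec}}$$

since $\sum_{c \in \mathcal{C}_h^\tau} \Pr_{\mathcal{C}_h^\tau}[c] = 1$. Consequently,

$$E_s[\mu(B_s^\tau)] \le \sqrt{\epsilon_{sec}} .$$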



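We now want to bound the probability that the mass of the bad set is large. Using Markov's inequality from Theorem 1 we get

$$\Pr_s[\mu(B_s^\tau) \ge \alpha] \le E_s[\mu(B_s^\tau)] / \alpha .$$

When $\alpha = \sqrt[4]{\epsilon_{sec}}$, we get

$$\Pr_s[\mu(B_s^\tau) \ge \sqrt[4]{\epsilon_{sec}}] \le \sqrt[4]{\epsilon_{sec}} .$$

By Boole's inequality from Theorem 3 we get

$$\Pr_s[\exists \tau \mid \mu(B_s^\tau) \ge \sqrt[4]{\epsilon_{sec}}] \le \ell \sqrt[4]{\epsilon_{sec}} .$$

So,

$$\Pr_s[\forall \tau \mid \mu(G_s^\tau) \ge 1 - \sqrt[4]{\epsilon_{sec}}] \ge 1 - \ell \sqrt[4]{\epsilon_{sec}}$$

where $\ell = \lceil \nu / (c \log \nu) \rceil$ is the number of message blocks.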
So, we can see that if we pick a seed $s$ uniformly at random, then with probability at least $1 - \ell\sqrt[4]{\epsilon_{sec}}$ in the choice of $s$, for all $\tau$, $\mu(B_s^\tau)$ is small, which implies that $\mu(G_s^\tau)$ is large, i.e., for all $\tau$, $\mu(G_s^\tau) \ge 1 - \sqrt[4]{\epsilon_{sec}}$. Here $\tau = 1, 2, \ldots, \ell$.

We say that a seed $s$ is good if for all $\tau$, $\mu(G_s^\tau) \ge 1 - \sqrt[4]{\epsilon_{sec}}$. From the discussion above we know that

$$\Pr_{s \in_R \{0,1\}^d}[s \text{ is good}] \ge 1 - \ell \sqrt[4]{\epsilon_{sec}} .$$
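The Markov and Boole steps reduce to simple arithmetic, which the following Python sketch spells out. The function name `good_seed_bounds` is a hypothetical helper; the formulas are the ones used in the argument above, with the union bound capped at 1 so that a vacuous bound is visible.

```python
def good_seed_bounds(eps_sec: float, ell: int):
    """Return (per_depth, any_depth, good_seed) probability bounds.

    per_depth: Markov bound on Pr_s[mu(B) >= alpha] with alpha = eps_sec ** (1/4)
    any_depth: Boole (union) bound over the ell depths, capped at 1
    good_seed: resulting lower bound on Pr_s[s is good]
    """
    alpha = eps_sec ** 0.25              # Markov threshold
    expected_bad = eps_sec ** 0.5        # bound on E_s[mu(B)]
    per_depth = expected_bad / alpha     # equals eps_sec ** (1/4)
    any_depth = min(1.0, ell * per_depth)
    return per_depth, any_depth, 1.0 - any_depth

per_depth, any_depth, good_seed = good_seed_bounds(2.0 ** -80, ell=26)
print(per_depth, any_depth, good_seed)
```

The bound is only informative when $\ell\sqrt[4]{\epsilon_{sec}}$ is well below 1, so $\epsilon_{sec}$ has to shrink as the number of blocks grows.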
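Now, fix a good seed $s$. We will now prove that for a good seed $s$,

$$\Delta[(C_1, C_2, \ldots, C_\ell), (R_1, R_2, \ldots, R_\ell)] \le \ell \cdot (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}) .$$

We prove this by induction on $\ell$, the number of message blocks.

Base case: for $\tau = 1$, $\Delta[C_1, R_1] \le \sqrt{\epsilon_{sec}} \le \sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}$.

Inductive hypothesis: $\Delta[(C_1, C_2, \ldots, C_\tau), (R_1, R_2, \ldots, R_\tau)] \le \tau \cdot (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}})$.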
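To show:

$$\Delta[(C_1, C_2, \ldots, C_{\tau+1}), (R_1, R_2, \ldots, R_{\tau+1})] \le (\tau + 1) \cdot (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}) .$$

Observe that

$$\begin{aligned}
\Delta[(C_1, \ldots, C_{\tau+1}), (R_1, \ldots, R_{\tau+1})]
&\le \Delta[(C_1, \ldots, C_\tau), (R_1, \ldots, R_\tau)] \\
&\quad + \Delta[(C_1, \ldots, C_\tau, C_{\tau+1}), (C_1, \ldots, C_\tau, R_{\tau+1})] && \text{by Lemma 10} \\
&\le \tau \cdot (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}) + \Delta[(C_1, \ldots, C_\tau, C_{\tau+1}), (C_1, \ldots, C_\tau, R_{\tau+1})] \\
&\le \tau \cdot (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}) + (1 - \sqrt[4]{\epsilon_{sec}}) \cdot \sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}} \cdot 1 && \text{by Lemma 11} \\
&\le \tau \cdot (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}) + (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}) \\
&\le (\tau + 1) \cdot (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}) .
\end{aligned}$$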
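Hence we can conclude that for a good seed $s$,

$$\Delta[(C_1, C_2, \ldots, C_\ell), (R_1, R_2, \ldots, R_\ell)] \le \ell \cdot (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}) .$$

So, the statistical distance is now given by

$$\begin{aligned}
\Delta[(C_1, \ldots, C_\ell), (R_1, \ldots, R_\ell)]
&= \Delta[(C_1, \ldots, C_\ell), (R_1, \ldots, R_\ell)] \mid_{s \text{ is good}} \cdot \Pr[s \text{ is good}] \\
&\quad + \Delta[(C_1, \ldots, C_\ell), (R_1, \ldots, R_\ell)] \mid_{s \text{ is not good}} \cdot \Pr[s \text{ is not good}] \\
&\le \ell \cdot (\sqrt{\epsilon_{sec}} + \sqrt[4]{\epsilon_{sec}}) \cdot (1 - \ell\sqrt[4]{\epsilon_{sec}}) + 1 \cdot \ell\sqrt[4]{\epsilon_{sec}} \\
&\le \ell \cdot (\sqrt{\epsilon_{sec}} + 2\sqrt[4]{\epsilon_{sec}}) .
\end{aligned}$$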



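Thus,

$$\Delta[SE(k, m', h), \mathcal{C}_h^{\lambda}] \le \ell \cdot (\sqrt{\epsilon_{sec}} + 2\sqrt[4]{\epsilon_{sec}})$$

where $\ell = \lceil \nu / (c \log \nu) \rceil$ for some constant $c$, and the theorem follows by the definition of insecurity.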



□ 
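For a feel of the magnitudes in Theorem 12, the bound can be evaluated directly. The Python sketch below makes two assumptions that the text does not fix: logarithms are taken base 2, and the number of blocks is rounded up. The function names are illustrative.

```python
import math

def num_blocks(nu: int, c: int) -> int:
    """Number of message blocks, ell = ceil(nu / (c * log2(nu))) (base-2 log assumed)."""
    return math.ceil(nu / (c * math.log2(nu)))

def insecurity_bound(nu: int, c: int, eps_sec: float) -> float:
    """Evaluate ell * (sqrt(eps_sec) + 2 * eps_sec ** (1/4))."""
    ell = num_blocks(nu, c)
    root = math.sqrt(eps_sec)
    fourth_root = math.sqrt(root)
    return ell * (root + 2.0 * fourth_root)

ell = num_blocks(1024, 4)
bound = insecurity_bound(1024, 4, 2.0 ** -80)
print(ell, bound)
```

The fourth-root term dominates, so halving the target insecurity costs roughly four extra bits in $\log(1/\epsilon_{sec})$.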
3.3 Correctness

In this section we obtain an upper bound on the soundness of our stegosystem. We focus on the mapping between $\{0,1\}^{\nu}$ and $\Sigma^{\lambda}$ determined by the SE procedure of the one-time stegosystem. We would like to bound the probability of the stego decoding procedure's inability to faithfully recover the encoded message. Following the definition of soundness from Section 2.7 we seek to upper bound this probability of failure:

$$\Pr[SD(1^{\nu}, k, SE(1^{\kappa}, k, m, h)) \ne m \mid k \leftarrow SK(1^{\nu}, \log(1/\epsilon_{sec}))], \quad \forall m \in M_{\nu} .$$

For simplicity of notation, let $F$ be the event that Bob is unable to correctly decode the message sent from Alice, i.e., $SD(1^{\nu}, k, SE(1^{\kappa}, k, m, h)) \ne m \mid k \leftarrow SK(1^{\nu}, \log(1/\epsilon_{sec}))$. Since $m = m_1 \circ m_2 \circ \cdots \circ m_\ell$ and $|m_i| = c \log \nu$ for $i = 1, \ldots, \ell$, let us first estimate the probability of failure for one message block, denoted by $F'$. We reuse the notation and definitions introduced in the security proof of the section above. Recall that a seed $s$ is good if for all $\tau$, $\mu(G_s^\tau) \ge 1 - \sqrt[4]{\epsilon_{sec}}$. As shown before, the probability that the seed $s$ is good satisfies $\Pr_s[\forall \tau \mid \mu(G_s^\tau) \ge 1 - \sqrt[4]{\epsilon_{sec}}] \ge 1 - \ell\sqrt[4]{\epsilon_{sec}}$. This gives us

$$\begin{aligned}
\Pr[F] &= \ell \cdot (\Pr[F' \mid s \text{ is good}] \cdot \Pr[s \text{ is good}] + \Pr[F' \mid s \text{ is not good}] \cdot \Pr[s \text{ is not good}]) \\
&\le \ell \cdot (\Pr[F' \mid s \text{ is good}] \cdot 1 + 1 \cdot \ell\sqrt[4]{\epsilon_{sec}})
\end{aligned}$$

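where $\ell = \nu / (c \log \nu)$. Let us now compute the quantity $\Pr[F' \mid s \text{ is good}]$. We know that when the seed $s$ is good, the extractor coupled with the seed $s$ is a $\sqrt{\epsilon_{sec}}$-biased function with probability at least $1 - \sqrt[4]{\epsilon_{sec}}$. So, we get

$$\Pr[F' \mid s \text{ is good}] \le (1 - \sqrt[4]{\epsilon_{sec}}) \cdot \Big(1 - \Big(\frac{1}{2^{|m_i|}} - \sqrt{\epsilon_{sec}}\Big)\Big)^{\rho} + \sqrt[4]{\epsilon_{sec}} \cdot 1$$

where $\rho$ is the bound on the number of iterations performed by the rejection sampling procedure. When $\epsilon_{sec} = \frac{1}{4 \cdot 2^{2|m_i|}}$ and $\rho = 2 \cdot |m_i| \cdot 2^{|m_i|}$ we get

$$\Pr[F' \mid s \text{ is good}] \le (1 - \sqrt[4]{\epsilon_{sec}}) \cdot \frac{1}{2^{|m_i|}} + \sqrt[4]{\epsilon_{sec}} .$$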



In our construction, we have each $|m_i| = c \log \nu$. So, $\epsilon_{sec} = \frac{1}{4 \cdot 2^{2|m_i|}} = \frac{1}{4\nu^{2c}}$ and $\rho = 2 \cdot c \log \nu \cdot \nu^{c}$. Here $\ell = \nu / (c \log \nu)$.

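Thus we can bound $\Pr[F]$ as

$$\begin{aligned}
\Pr[F] &\le \ell \cdot \big(\Pr[F' \mid s \text{ is good}] \cdot 1 + 1 \cdot \ell\sqrt[4]{\epsilon_{sec}}\big) \\
&\le \ell \cdot \Big((1 - \sqrt[4]{\epsilon_{sec}}) \cdot \frac{1}{2^{|m_i|}} + \sqrt[4]{\epsilon_{sec}} + \ell\sqrt[4]{\epsilon_{sec}}\Big) \\
&\le \frac{\nu}{c \log \nu \cdot \nu^{c}} + \frac{\nu}{c \log \nu \cdot \sqrt{2}\,\nu^{c/2}} + \frac{\nu^{2}}{c^{2} \log^{2} \nu \cdot \sqrt{2}\,\nu^{c/2}} \\
&= \frac{1}{c \log \nu \cdot \nu^{c-1}} + \frac{1}{c \log \nu \cdot \sqrt{2}\,\nu^{c/2-1}} + \frac{1}{c^{2} \log^{2} \nu \cdot \sqrt{2}\,\nu^{c/2-2}} \\
&= \frac{1}{4\nu^{3} \log \nu} + \frac{1}{4\sqrt{2}\,\nu \log \nu} + \frac{1}{16\sqrt{2}\,\log^{2} \nu} \qquad \text{if we choose the constant } c = 4 .
\end{aligned}$$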

We record the following lemma which follows from the discussion above. 

Lemma 13. With SE and SD described as above, the probability that a message $m$ of length $\nu$ is recovered from the stegosystem is at least $1 - \left(\frac{1}{4\nu^{3} \log \nu} + \frac{1}{4\sqrt{2}\,\nu \log \nu} + \frac{1}{16\sqrt{2}\,\log^{2} \nu}\right)$.
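The three terms of the failure bound can be cross-checked numerically. The Python sketch below assumes the parameter choices read from the derivation ($\epsilon_{sec} = 1/(4\nu^{2c})$, $\ell = \nu/(c\log\nu)$, base-2 logarithms) and compares the general-$c$ terms with the closed forms for $c = 4$. Function names are illustrative.

```python
import math

def failure_terms(nu: float, c: int):
    """General-c terms: (ell / nu**c, ell * eps**(1/4), ell**2 * eps**(1/4))."""
    log_nu = math.log2(nu)
    ell = nu / (c * log_nu)
    eps_sec = 1.0 / (4.0 * nu ** (2 * c))
    fourth_root = eps_sec ** 0.25   # equals 1 / (sqrt(2) * nu**(c/2))
    return (ell / nu ** c, ell * fourth_root, ell * ell * fourth_root)

def failure_terms_c4(nu: float):
    """Closed forms of the same three terms for c = 4."""
    log_nu = math.log2(nu)
    root2 = math.sqrt(2.0)
    return (
        1.0 / (4.0 * nu ** 3 * log_nu),
        1.0 / (4.0 * root2 * nu * log_nu),
        1.0 / (16.0 * root2 * log_nu ** 2),
    )

general = failure_terms(256.0, 4)
closed = failure_terms_c4(256.0)
print(sum(closed))
```

Under these assumptions the last term decays only like $1/\log^2 \nu$, so it dominates the bound for $c = 4$.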
3.4 Putting it all together

The objective of this section is to integrate the results of the paper. We first show that for a perfectly secure, perfectly sound steganography protocol, we need at least $\nu$ bits of min-entropy in the communication channel to embed a message $m \in_R \{0,1\}^{\nu}$. This analysis yields a lower bound on the length of the stegotext transmitted. Next, we show that for such a steganography protocol, we also need the length of the shared secret key $k$ to be at least $\nu$ bits. This gives us a lower bound on the number of random bits used by a steganography protocol.

Consider the following experiment: Fix a family of channel distributions. Now, pick a message $m \in_R \{0,1\}^{\nu}$ and fix the shared secret $k$, independent of the message $m$. Finally, fix the encoding algorithm $Enc(k, m)$ and the deterministic decoding algorithm $Dec(k, Enc(k, m))$. Alice encodes the message using the encoding procedure and transmits $Enc(k, m)$ to Bob, who decodes the message with no errors as the stegosystem is perfectly sound. Let us now show that the random variable $Enc(k, m)$ has at least $\nu$ bits of min-entropy. As the encoding function is probabilistic, the random variable $Enc(k, m)$ depends on the message $m$ and the channel distribution. Notice that since the decoding procedure is deterministic, the distribution output by the decoding algorithm has at most the min-entropy of its input distribution. Now, Bob recovers $m$ from $Enc(k, m)$ without any errors. Since there are $\nu$ bits of min-entropy in the message source $m \in_R \{0,1\}^{\nu}$, it follows that there are at least $\nu$ bits of min-entropy in $Enc(k, m)$ as well. Since the stegosystem is perfectly secure, the distribution $Enc(k, m)$ is identical to the channel distribution. This implies that the channel also has at least $\nu$ bits of min-entropy, which establishes a lower bound on the communication complexity of a steganographic protocol. Suppose we have a channel distribution such that for every valid history the min-entropy of the channel is $\delta$. The above discussion shows that to transmit a message $m \in_R \{0,1\}^{\nu}$, the length of the stegotext transmitted is at least $\nu/\delta$ symbols.

Let us now focus on the number of random bits in the shared secret $k$. As the stegosystem is sound, the decoding procedure reconstructs the message $m$ perfectly. Furthermore, since this is a perfectly secure stegosystem, the random variables $m$ and $Enc(k, m)$ are independent, so the mutual information between $m$ and $Enc(k, m)$ is zero. But the mutual information between $m$ and the random variable pair $(Enc(k, m), k)$ is $\nu$, as Bob is able to recover $m$ from the pair $(Enc(k, m), k)$. It follows therefore that $k$ has $\nu$ bits of min-entropy, i.e., $k$ can be chosen uniformly from $\{0,1\}^{\nu}$.

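If we let $t = \delta^{-1} \cdot (c \log \nu + 2 \log(1/\epsilon_{sec}) + O(1))$ for some constant $c$, the channel distribution $\mathcal{C}_h^{t}$ supported on $\{0,1\}^{\delta^{-1} (c \log \nu + 2 \log(1/\epsilon_{sec}) + O(1)) \cdot b}$ has a min-entropy of at least $k = c \log \nu + 2 \log(1/\epsilon_{sec}) + O(1)$. To put this all together, the RRV strong-extractor is a function

$$E : \{0,1\}^{n} \times \{0,1\}^{d} \to \{0,1\}^{k - \Delta}$$

where

$$\begin{aligned}
n &= \delta^{-1} \cdot (c \log \nu + 2 \log(1/\epsilon_{sec}) + O(1)) \cdot b \\
d &= O\big(\log^{2}(\delta^{-1} \cdot (c \log \nu + 2 \log(1/\epsilon_{sec}) + O(1)) \cdot b) \cdot \log(1/\epsilon_{sec}) \cdot \log k\big) \\
k &= c \log \nu + 2 \log(1/\epsilon_{sec}) + O(1) \\
\Delta &= 2 \log(1/\epsilon_{sec}) + O(1) \quad \text{and} \\
k - \Delta &= c \log \nu .
\end{aligned}$$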

This implies that our stegotext is of length $\ell \cdot \rho \cdot \delta^{-1} \cdot (c \log \nu + 2 \log(1/\epsilon_{sec}) + O(1)) \cdot b$ bits to embed $\nu$ bits of message. Alice and Bob share $\nu$ secret bits. They also use a public seed for the extractor of length $d$ bits.
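The extractor parameters listed above can be tabulated for concrete values. The Python sketch below is a rough calculator under stated assumptions: base-2 logarithms, the $O(1)$ terms replaced by a single hypothetical constant `const` (default 0), and the seed length $d$ omitted because its hidden constant is not specified. `ExtractorParams` and `extractor_params` are illustrative names.

```python
import math
from dataclasses import dataclass

@dataclass
class ExtractorParams:
    n: float             # input length in bits
    k: float             # required min-entropy in bits
    entropy_loss: float  # Delta
    out_bits: float      # k - Delta, the size of one message block

def extractor_params(nu: int, c: int, eps_sec: float, delta: float, b: int,
                     const: float = 0.0) -> ExtractorParams:
    """Evaluate n, k, Delta and k - Delta with the O(1) terms set to `const`."""
    block_bits = c * math.log2(nu)
    entropy_loss = 2.0 * math.log2(1.0 / eps_sec) + const
    k = block_bits + entropy_loss
    n = (1.0 / delta) * k * b
    return ExtractorParams(n=n, k=k, entropy_loss=entropy_loss, out_bits=k - entropy_loss)

params = extractor_params(nu=1024, c=4, eps_sec=2.0 ** -80, delta=0.5, b=8)
print(params)
```

Most of the required min-entropy pays for the entropy loss $2\log(1/\epsilon_{sec})$ rather than for the embedded block itself.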



4 A provably secure stegosystem for longer messages 

In this section we show how to apply the "one-time" stegosystem of Section 3.1 together with a pseudorandom 
number generator so that longer messages can be transmitted. 
Definition 10. Let $U_l$ denote the uniform distribution over $\{0,1\}^{l}$. A polynomial time deterministic program $G$ is a pseudorandom generator (PRG) if the following conditions are satisfied:

Variable output: For all seeds $x \in \{0,1\}^{*}$ and $y \in \mathbb{N}$, $|G(x, 1^{y})| = y$ and, furthermore, $G(x, 1^{y})$ is a prefix of $G(x, 1^{y+1})$.

Pseudorandomness: For every polynomial $p$ the set of random variables $\{G(U_l, 1^{p(l)})\}_{l \in \mathbb{N}}$ is computationally indistinguishable from the uniform distribution $U_{p(l)}$.

Note that there is a procedure $G'$ such that if $z = G(x, 1^{y})$ then $G(x, 1^{y+y'}) = G'(x, z, 1^{y'})$ (i.e., if one maintains $z$, one can extract the $y'$ bits that follow the first $y$ bits without starting from the beginning). For a PRG $G$, if $A$ is some statistical test, then we define the advantage of $A$ over the PRG as follows:

$$\mathbf{Adv}_G^A(l) = \left| \Pr_{\hat{l} \leftarrow G(U_l, 1^{p(l)})}[A(\hat{l}) = 1] - \Pr_{\hat{l} \leftarrow U_{p(l)}}[A(\hat{l}) = 1] \right| .$$

The insecurity of the PRG $G$ is then defined as

$$\mathbf{InSec}^{PRG}_G(l) = \max_A \{\mathbf{Adv}_G^A(l)\} .$$
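The advantage of a statistical test can be estimated empirically by sampling both distributions. The Python sketch below is only an illustration of the definition: `weak_gen` is a deliberately broken generator invented for this example, `first_bit_is_zero` is a toy test, and a Monte Carlo estimate is of course not a proof of (in)security.

```python
import random

def empirical_advantage(gen, test, seed_len, out_len, trials, rng):
    """Estimate |Pr[test(G(U_l)) = 1] - Pr[test(U_p(l)) = 1]| from `trials` samples."""
    hits_gen = 0
    hits_uniform = 0
    for _ in range(trials):
        seed = [rng.getrandbits(1) for _ in range(seed_len)]
        hits_gen += test(gen(seed, out_len))
        hits_uniform += test([rng.getrandbits(1) for _ in range(out_len)])
    return abs(hits_gen - hits_uniform) / trials

def weak_gen(seed, out_len):
    """Deliberately bad generator: repeats the seed and forces the first bit to 0."""
    bits = [seed[i % len(seed)] for i in range(out_len)]
    bits[0] = 0
    return bits

def first_bit_is_zero(bits):
    """Toy statistical test A: output 1 iff the first bit is 0."""
    return 1 if bits[0] == 0 else 0

rng = random.Random(7)
adv = empirical_advantage(weak_gen, first_bit_is_zero, seed_len=8, out_len=16,
                          trials=2000, rng=rng)
print(adv)
```

For this broken generator the test accepts every output, while it accepts about half of the uniform strings, so the estimate sits near 1/2.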

Note that typically in PRGs there is a procedure $G'$, and the process $G(x, 1^{y})$ produces some auxiliary data $aux_y$ of small length, so that the rightmost $y'$ bits of $G(x, 1^{y+y'})$ may be sampled directly as $G'(x, 1^{y'}, aux_y)$. Consider now the following stegosystem $S' = (SE', SD')$ that can be used for arbitrarily many and arbitrarily long messages and employs a PRG $G$ and the one-time stegosystem $(SK, SE, SD)$ of Section 3.1. The two players, Alice and Bob, share a key of length $l$ denoted by $x$. They also maintain a state $N$ that holds the number of bits that have been transmitted already, as well as the auxiliary information $aux_N$ (initially empty). The function $SE'$ is given input $N$, $aux_N$, $x$, $m \in \{0,1\}^{\nu}$ where $m$ is the message to be transmitted. $SE'$ in turn employs the PRG $G$ to extract a number of bits $\kappa$ as follows: $k = G'(x, 1^{\kappa}, aux_N)$. The length $\kappa$ is selected to match the number of key bits that are required to transmit the message $m$ using the one-time stegosystem of Section 3.1. Once the key $k$ is produced by the PRG, the procedure $SE'$ invokes the one-time stegosystem on input $k, m, h$. After the transmission is completed, the history $h$, the count $N$, as well as the auxiliary PRG information $aux_N$ are updated accordingly. The function $SD'$ is defined in a straightforward way based on SD.
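The key-derivation state kept by Alice and Bob can be sketched as a stream with an offset. The Python example below uses SHAKE-256 from the standard library as a stand-in for the PRG $G$; this choice, the byte (rather than bit) granularity, and the `KeyStream` class are assumptions for illustration. For simplicity the sketch recomputes the prefix on every call instead of keeping auxiliary data $aux_N$.

```python
import hashlib

def G(x: bytes, y: int) -> bytes:
    """First y bytes of a SHAKE-256 stream keyed by x (stand-in for the PRG)."""
    return hashlib.shake_256(x).digest(y)

class KeyStream:
    """Tracks N, the number of PRG output bytes already consumed."""

    def __init__(self, x: bytes):
        self.x = x
        self.N = 0

    def next_key(self, kappa: int) -> bytes:
        """Return the kappa bytes that follow the first N bytes, then advance N.

        A real G' would use auxiliary state to avoid regenerating the prefix.
        """
        out = G(self.x, self.N + kappa)[self.N:]
        self.N += kappa
        return out

alice = KeyStream(b"shared-seed")
bob = KeyStream(b"shared-seed")
k1_alice, k1_bob = alice.next_key(4), bob.next_key(4)
k2_alice, k2_bob = alice.next_key(6), bob.next_key(6)
print(k1_alice == k1_bob, k2_alice == k2_bob)
```

Because both parties advance $N$ by the same amount after each message, they derive the same fresh one-time key without further communication.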

Theorem 14. The stegosystem $S' = (SE', SD')$ is provably secure in the model of [3] (universally steganographically secret against chosen hiddentext attacks); in particular

$$\mathbf{InSec}^{SS}_{S'}(t, q, l) \le \mathbf{InSec}^{PRG}(t + \gamma(\ell(l)), \ell(l) + \mathrm{polylog}(l))$$

(where $t$ is the time required by the adversary, $q$ is the number of chosen hiddentext queries it makes, $l$ is the total number of bits across all queries and $\gamma(v)$ is the time required to simulate the $SE'$ oracle for $v$ bits).